
feat: multi-model adversarial harnesses - structural hardening - security sandbox - battle-tested results #3

Open
lliWcWill wants to merge 8 commits into coleam00:main from lliWcWill:adversarial-dev-hardening

Conversation

lliWcWill commented Apr 2, 2026

Summary

  • Added two new harnesses: mixed-harness (Claude Opus generator, GPT-5.4 evaluator) and gemini-harness (Claude Opus generator, Gemini 3.1 Pro evaluator with sandboxed tool calling)
  • Hardened contract negotiation with 3-round iterative loop, fail-closed parsing, and mid-sprint renegotiation triggers
  • Built ConversationLogger that saves every agent exchange as Obsidian markdown and JSONL
  • Added 29 tests covering parseContract, renegotiation logic, parseEvalResult, and conversation logger
  • Secured the Gemini evaluator sandbox: command allowlisting, realpath symlink protection, git read-only, find -exec blocking, absolute path rejection
  • Battle-tested all four harnesses on the same real-world bug fix prompt

Results - first multi-model harness runs

| Harness | Generator | Evaluator | Result | Time |
| --- | --- | --- | --- | --- |
| claude-harness | Opus 4.6 | Opus 4.6 | 5/5 PASSED | 53.4 min |
| codex-harness | GPT-5.4 | GPT-5.4 | 0/1 FAILED | 59.6 min |
| mixed-harness | Opus 4.6 | GPT-5.4 | 11/13 on Sprint 4 | 60+ min |
| gemini-harness | Opus 4.6 | Gemini 3.1 Pro | 5/5 PASSED | 50.7 min |

Cross-model evaluation caught bugs that self-evaluation missed:

  • GPT-5.4 flagged incomplete OAuth token refresh that Claude gave itself a pass on
  • GPT-5.4 caught REPL token display showing zero after completed responses
  • Gemini ran the full test suite via tool calling before scoring; Sprint 4 failed on the first attempt

Structural fixes to harness logic

  • Contract negotiation is now iterative: generator proposes, evaluator reviews, up to 3 rounds of back-and-forth before finalizing
  • parseContract throws on malformed JSON instead of falling back to defaults - fail-closed, not fail-open
  • Retry loop triggers renegotiation when avgScore drops below 4 or all criteria are failing
  • Division-by-zero guard on empty feedback arrays
  • APPROVED check is now case-insensitive startsWith instead of fragile exact-match
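A minimal sketch of the fail-closed parsing and the relaxed approval check described above. The function names match the PR; the bodies are illustrative assumptions, not the actual implementation.

```typescript
interface Criterion { name: string; threshold: number }
interface Contract { criteria: Criterion[] }

// Throws on malformed input instead of silently substituting a default
// contract -- fail-closed, not fail-open.
function parseContract(raw: string): Contract {
  // Strip an optional ```json fenced block before parsing.
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  const body = fenced ? fenced[1] : raw;
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    throw new Error("parseContract: malformed JSON (fail-closed)");
  }
  const contract = parsed as Contract;
  if (!Array.isArray(contract.criteria) || contract.criteria.length === 0) {
    throw new Error("parseContract: missing criteria (fail-closed)");
  }
  return contract;
}

// Case-insensitive startsWith instead of a fragile exact match, so
// "Approved." or "approved: ship it" still counts.
function isApproved(reply: string): boolean {
  return reply.trim().toUpperCase().startsWith("APPROVED");
}
```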

Gemini evaluator sandbox - 5 layers of defense

  • Command allowlist: no code execution binaries - node, bun, npm removed entirely
  • Git restricted to read-only subcommands: log, status, diff, show, ls-files, rev-parse
  • Dangerous flag blocking: find -exec, -execdir, -delete rejected
  • Path confinement: absolute paths outside workspace rejected for all commands
  • Symlink resolution: realpath prevents symlink-based escape from workspace
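The allowlist, path-confinement, and symlink layers can be sketched roughly as follows. This is an assumed shape for the handler, not the PR's actual code; the key ideas are `realpathSync` (so symlinks cannot escape the workspace) and `execFileSync` with an argument array (so nothing passes through a shell).

```typescript
import { execFileSync } from "node:child_process";
import { realpathSync } from "node:fs";
import * as path from "node:path";

// No code-execution binaries in the allowlist (no node/bun/npm).
const ALLOWED = new Set(["cat", "ls", "grep", "wc", "head", "tail"]);

// Resolve symlinks and reject anything that lands outside the workspace.
function confine(workspace: string, p: string): string {
  const resolved = realpathSync(path.resolve(workspace, p));
  const root = realpathSync(workspace);
  if (resolved !== root && !resolved.startsWith(root + path.sep)) {
    throw new Error(`path escapes workspace: ${p}`);
  }
  return resolved;
}

function runCommand(workspace: string, cmd: string, args: string[]): string {
  if (!ALLOWED.has(cmd)) throw new Error(`command not allowlisted: ${cmd}`);
  for (const a of args) {
    // Absolute paths are only allowed if they resolve inside the workspace.
    if (path.isAbsolute(a)) confine(workspace, a);
  }
  // Argument array + no shell means no interpolation or chaining tricks.
  return execFileSync(cmd, args, { cwd: workspace, encoding: "utf8" });
}
```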

New files

  • mixed-harness/ - 5 files: Claude generates code, Codex GPT-5.4 evaluates
  • gemini-harness/ - 5 files: Claude generates code, Gemini 3.1 Pro evaluates with tool calling
  • shared/conversation-logger.ts - Obsidian markdown and JSONL transcript export
  • tests/mixed-harness.test.ts - 22 tests
  • tests/conversation-logger.test.ts - 7 tests
  • RESULTS.md - full battle report with scores, timings, analysis
  • examples/gemini-run-excerpt.md - sample ConversationLogger output

Test plan

  • bun test - 29 tests passing
  • Claude harness: 5/5 sprints passed on real-world prompt
  • Gemini harness: 5/5 sprints passed on same prompt
  • Codex harness: confirmed failure mode documented
  • Mixed harness: 11/13 criteria passing on Sprint 4, adversarial evaluation working as designed
  • Verify mixed harness completes full run
  • Run harnesses on a second prompt to confirm generalization

1. Iterative negotiation: negotiateContract now runs up to 3 rounds
   of generator→evaluator back-and-forth instead of single-pass.
   Generator counter-proposes based on evaluator feedback until APPROVED.

2. Fail closed on malformed contracts: parseContract throws instead of
   silently falling back to generic 3-criterion default. Caller retries
   negotiation up to 2 times before propagating the error.

3. Renegotiate on bad criteria: retry loop now detects when all criteria
   are failing (avg score < 4 or all below threshold) and triggers
   contract renegotiation mid-sprint instead of burning retries against
   impossible criteria.

Applied to both claude-harness and codex-harness.
CodeRabbit review caught two issues:
1. Empty feedback array → division by zero → NaN avgScore
2. allFailing=true branch logged but never renegotiated (try block
   was inside else-if only)

Fix: add feedback.length > 0 guard, restructure to outer condition
gates renegotiation with inner if/else for accurate log messages.
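The trigger with both CodeRabbit fixes folded in can be sketched like this: guard the average against an empty feedback array, and keep `allFailing` in the same outer condition so that branch actually renegotiates. Names and thresholds here are illustrative assumptions.

```typescript
interface Feedback { criterion: string; score: number } // assumed shape

const PASS_THRESHOLD = 7;    // assumed per-criterion passing score
const RENEGOTIATE_AVG = 4;   // avg below this triggers renegotiation

function shouldRenegotiate(feedback: Feedback[]): boolean {
  // Guard: an empty array would make avg = 0/0 = NaN and every() vacuously true.
  if (feedback.length === 0) return false;
  const avg = feedback.reduce((s, f) => s + f.score, 0) / feedback.length;
  const allFailing = feedback.every(f => f.score < PASS_THRESHOLD);
  // Single outer condition gates renegotiation, so neither branch is dead code.
  return avg < RENEGOTIATE_AVG || allFailing;
}
```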

New mixed-harness/ — cross-model adversarial dev inspired by GAN architecture:
- Generator (Claude Opus 4.6) builds code against sprint contracts
- Evaluator (Codex GPT-5.4) rips apart the work in fresh context
- Zero sycophancy: evaluator has no emotional investment in the code

Includes all 3 hardening fixes from parent harnesses:
- Iterative contract negotiation (3 rounds)
- Fail-closed contract parsing (throws on garbage)
- Mid-sprint renegotiation when all criteria fail

ConversationLogger (shared/conversation-logger.ts):
- Captures every agent prompt, response, tool call, score, and error
- Saves as Obsidian-friendly markdown (.md) + machine-readable JSONL
- Default output: agent-brain-vault/Projects/brane-code/debates/
- Collapsible tool calls, score badges, duration tracking
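A stripped-down sketch of the dual export. The real shared/conversation-logger.ts also tracks tool calls, durations, and disk output; this assumed shape only shows the markdown/JSONL split and the empty-log behavior.

```typescript
interface Entry { role: string; content: string; score?: number }

class ConversationLogger {
  private entries: Entry[] = [];

  log(entry: Entry): void {
    this.entries.push(entry);
  }

  // One JSON object per line; an empty log yields "" rather than a bare newline.
  toJSONL(): string {
    return this.entries.map(e => JSON.stringify(e)).join("\n");
  }

  // Obsidian-friendly markdown: one heading per exchange, score as a badge.
  toMarkdown(): string {
    return this.entries
      .map(e => `## ${e.role}${e.score !== undefined ? ` \`score: ${e.score}\`` : ""}\n\n${e.content}`)
      .join("\n\n");
  }
}
```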

Tests (29 passing):
- parseContract: 9 tests (fail-closed, code blocks, garbage rejection)
- Renegotiation trigger: 7 tests (thresholds, division-by-zero guard)
- parseEvalResult: 3 tests (threshold recalculation, extraction)
- Negotiation rounds: 3 tests (early approval, max rounds)
- ConversationLogger: 7 tests (entries, markdown, JSONL, disk save)
New gemini-harness/ — cross-company adversarial dev:
- Generator: Claude Opus 4.6 (Anthropic) builds code via Agent SDK
- Evaluator: Gemini 3.1 Pro Preview (Google) rips it apart via @google/genai
- Gemini evaluator has tool calling: readFile, runCommand, listFiles
- Multi-turn chat loop handles tool calls until Gemini is done evaluating
- 1M context on BOTH sides — true heavyweight matchup
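The multi-turn loop can be sketched generically as below. The `Model` interface here is a stand-in, not the real @google/genai chat signature; the point is the control flow: keep answering tool calls until the model replies with plain text.

```typescript
interface ToolCall { name: string; args: Record<string, string> }
interface Turn { text?: string; toolCalls: ToolCall[] }
interface Model { send(message: string): Promise<Turn> }  // stand-in for the SDK chat

type ToolHandler = (args: Record<string, string>) => string;

async function evaluate(
  model: Model,
  tools: Record<string, ToolHandler>,
  prompt: string,
): Promise<string> {
  let turn = await model.send(prompt);
  // Loop until the evaluator stops requesting tools and emits its verdict.
  while (turn.toolCalls.length > 0) {
    const results = turn.toolCalls.map(call => {
      const handler = tools[call.name];
      return handler ? handler(call.args) : `unknown tool: ${call.name}`;
    });
    turn = await model.send(JSON.stringify(results));
  }
  return turn.text ?? "";
}
```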

Same 3 hardening fixes as other harnesses:
- Iterative contract negotiation (3 rounds)
- Fail-closed contract parsing
- Mid-sprint renegotiation on bad criteria

Includes ConversationLogger integration for full transcript logging.

SDK: @google/genai@1.48.0
Model: gemini-3.1-pro-preview (1M input, 65K output)
- Gemini evaluator: sandbox tool handlers with path confinement,
  command allowlisting (execFileSync instead of execSync), and
  fs.readdir instead of shell-interpolated find
- Fix fragile "APPROVED" exact-match in all 4 harness negotiation
  loops — now case-insensitive startsWith
- Remove personal vault path from logDir defaults (use ./logs)
- Remove brane-streaming-fix.md (task doc, not project code)
- Remove unused imports in mixed/gemini harnesses
- Fix copy-paste comment ("Codex" -> "Gemini" in gemini harness)
- Gemini sandbox: use realpath() instead of resolve() to prevent
  symlink-based path traversal escape
- Gemini sandbox: reject runCommand args with absolute paths outside
  workspace (prevents `cat /etc/passwd`, `grep -r secret /etc`)
- All evaluators: guard against empty feedback array where
  [].every() returns true — prevents silent false pass
- ConversationLogger: empty JSONL returns "" not bare newline
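The `[].every()` footgun fixed above is worth a two-line illustration: `every` on an empty array is vacuously true, so an "all criteria pass" check must also require at least one criterion.

```typescript
const scores: number[] = []; // evaluator returned no feedback

// Vacuous truth: every() on [] is true, reporting a silent false pass.
const naivePass = scores.every(s => s >= 7);

// Guarded: an empty result can never count as passing.
const guardedPass = scores.length > 0 && scores.every(s => s >= 7);
```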
Security (Gemini evaluator sandbox):
- Remove node/npm/npx/bun/bunx from runCommand allowlist — prevents
  arbitrary code execution via `node -e "..."`
- Restrict git to read-only subcommands (log, status, diff, show,
  ls-files, rev-parse) — prevents data exfiltration via git push
- Block find -exec/-execdir/-delete flags — prevents subprocess spawn
- Keep path containment (realpath + absolute path rejection) from
  previous commit
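The git and find rules above amount to a per-command argument check, roughly like this (names and structure are assumptions): a subcommand allowlist for git and a flag denylist for find, applied before anything is executed.

```typescript
// Read-only git subcommands -- no push/fetch/remote, so no exfiltration channel.
const GIT_READONLY = new Set(["log", "status", "diff", "show", "ls-files", "rev-parse"]);

// find flags that spawn subprocesses or delete files.
const FIND_BLOCKED = new Set(["-exec", "-execdir", "-delete"]);

function checkArgs(cmd: string, args: string[]): void {
  if (cmd === "git" && !GIT_READONLY.has(args[0] ?? "")) {
    throw new Error(`git ${args[0] ?? ""}: only read-only subcommands allowed`);
  }
  if (cmd === "find" && args.some(a => FIND_BLOCKED.has(a))) {
    throw new Error("find: -exec/-execdir/-delete are blocked");
  }
}
```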

Polish:
- Fix wrong evaluator name in gemini generator prompt ("Codex" -> generic)
- Remove unused readContract import from claude-harness
- Fix README: wrong default model (sonnet -> opus), add mixed/gemini
  harness sections to Quick Start and Project Structure
RESULTS.md — full scoreboard, key findings, and analysis from 4 harness
runs (Claude 5/5, Codex 0/1, Mixed 11/13 on S4, Gemini 5/5).

examples/gemini-run-excerpt.md — first 150 lines of actual Gemini
evaluator conversation log showing the ConversationLogger output format.
